A Minimum Relative Entropy Controller for Undiscounted Markov Decision Processes
Authors
Abstract
Adaptive control problems are notoriously difficult to solve even in the presence of plant-specific controllers. One way to by-pass the intractable computation of the optimal policy is to restate the adaptive control problem as the minimization of the relative entropy of a controller that ignores the true plant dynamics from an informed controller. The solution is given by the Bayesian control rule, a set of equations characterizing a stochastic adaptive controller for the class of possible plant dynamics. Here, the Bayesian control rule is applied to derive BCR-MDP, a controller that solves undiscounted Markov decision processes with finite state and action spaces and unknown dynamics. In particular, we derive a non-parametric conjugate prior distribution over the policy space that encapsulates the agent's whole relevant history, and we present a Gibbs sampler to draw random policies from this distribution. Preliminary results show that BCR-MDP successfully avoids sub-optimal limit cycles due to its built-in mechanism for balancing exploration and exploitation.
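The paper's own construction (a non-parametric conjugate prior over policies sampled with a Gibbs sampler) is not reproduced here. As a rough, hedged illustration of the posterior-sampling idea behind controllers of this kind, the sketch below implements a simplified Thompson-sampling-style stand-in for a finite MDP with unknown dynamics: it keeps Dirichlet posteriors over the transition probabilities, samples a plausible model, plans against it with finite-horizon value iteration as a crude surrogate for the undiscounted criterion, and acts greedily. All class and parameter names are illustrative, not the paper's.

    # Minimal posterior-sampling sketch; NOT the paper's BCR-MDP construction.
    import numpy as np

    class PosteriorSamplingController:
        def __init__(self, n_states, n_actions, horizon=50, seed=0):
            self.nS, self.nA, self.H = n_states, n_actions, horizon
            self.rng = np.random.default_rng(seed)
            # Dirichlet(1, ..., 1) prior over next-state distributions, per (s, a).
            self.counts = np.ones((n_states, n_actions, n_states))
            # Running mean of observed rewards per (s, a); rewards assumed bounded.
            self.r_sum = np.zeros((n_states, n_actions))
            self.r_n = np.zeros((n_states, n_actions))

        def _sample_model(self):
            # Draw one transition model from the posterior; plug in mean rewards.
            P = np.array([[self.rng.dirichlet(self.counts[s, a])
                           for a in range(self.nA)] for s in range(self.nS)])
            R = np.where(self.r_n > 0, self.r_sum / np.maximum(self.r_n, 1), 0.0)
            return P, R

        def _plan(self, P, R):
            # Finite-horizon value iteration on the sampled model.
            V = np.zeros(self.nS)
            for _ in range(self.H):
                Q = R + P @ V      # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] V[s']
                V = Q.max(axis=1)
            return Q.argmax(axis=1)  # greedy policy for the sampled model

        def act(self, state):
            P, R = self._sample_model()
            return int(self._plan(P, R)[state])

        def update(self, s, a, r, s_next):
            self.counts[s, a, s_next] += 1
            self.r_sum[s, a] += r
            self.r_n[s, a] += 1

In a control loop against a (hypothetical) plant interface, one would call act(s), observe the reward and next state, and call update(s, a, r, s_next); resampling the model once per episode rather than per step is the more common variant of this idea. The randomness of the sampled model is what supplies the exploration-exploitation balance alluded to in the abstract.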
Similar resources
Relative Entropy Rate between a Markov Chain and Its Corresponding Hidden Markov Chain
In this paper we study the relative entropy rate between a homogeneous Markov chain and a hidden Markov chain defined by observing the output of a discrete stochastic channel whose input is a finite-state, homogeneous, stationary Markov chain. For this purpose, we obtain the relative entropy between two finite subsequences of the above-mentioned chains with the help of the definition of...
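For reference, the relative entropy rate between two stationary processes P and Q over a common finite alphabet is usually defined as the per-symbol limit of the relative entropy between their finite-dimensional distributions (when the limit exists); the closed-form results of the paper above are not reproduced here:

    \bar{D}(P \,\|\, Q) \;=\; \lim_{n \to \infty} \frac{1}{n}\, D\!\big(P_{X_1^n} \,\big\|\, Q_{X_1^n}\big)
                       \;=\; \lim_{n \to \infty} \frac{1}{n} \sum_{x_1^n} P(x_1^n) \log \frac{P(x_1^n)}{Q(x_1^n)}.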
ADK Entropy and ADK Entropy Rate in Irreducible-Aperiodic Markov Chain and Gaussian Processes
In this paper, the two-parameter ADK entropy, a generalization of Rényi entropy, is considered and some of its properties are investigated. We will see that the ADK entropy for continuous random variables is invariant under location transformations but not under scale transformations of the random variable. Furthermore, the joint ADK entropy, conditional ADK entropy, and chain rule of this ent...
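ADK entropy itself is specific to that work, but the Rényi entropy it generalizes, and the location/scale behaviour mentioned above, can be stated explicitly for order \alpha > 0, \alpha \neq 1:

    H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_i p_i^{\alpha} \quad \text{(discrete)}, \qquad
    h_\alpha(X) = \frac{1}{1-\alpha} \log \int f_X(x)^{\alpha}\, dx \quad \text{(continuous)},
    \qquad h_\alpha(aX + b) = h_\alpha(X) + \log |a|, \;\; a \neq 0,

so the continuous (differential) entropy is unchanged by a shift b but picks up a log|a| term under rescaling, mirroring the invariance properties claimed for ADK entropy.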
Markov Decision Processes and Stochastic Games with Total Effective Payoff
We consider finite Markov decision processes (MDPs) with undiscounted total effective payoff. We show that there exist uniformly optimal pure stationary strategies that can be computed by solving a polynomial number of linear programs. We apply this result to two-player zero-sum stochastic games with perfect information and undiscounted total effective payoff, and derive the existence of a sadd...
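The total effective payoff criterion and the particular linear programs used in that paper are not reproduced here; as a point of reference, the classical occupation-measure LP for the related undiscounted average-reward criterion (unichain case) has the form

    \max_{x \ge 0} \; \sum_{s,a} r(s,a)\, x(s,a)
    \quad \text{subject to} \quad \sum_{a} x(s',a) \;=\; \sum_{s,a} p(s' \mid s,a)\, x(s,a) \;\; \forall s', \qquad \sum_{s,a} x(s,a) = 1,

whose optimal solution x(s,a) is a stationary state-action frequency from which a pure optimal stationary policy can be read off.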
Loss Bounds for Uncertain Transition Probabilities in Markov Decision Processes
We analyze losses resulting from uncertain transition probabilities in Markov decision processes with bounded nonnegative rewards. We assume that policies are pre-computed using exact dynamic programming with the estimated transition probabilities, but the system evolves according to different, true transition probabilities. Our approach analyzes the growth of errors incurred by stepping backwa...
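A standard way to see how such errors grow when stepping backward through a finite-horizon dynamic program (not necessarily the exact bound derived in that paper) is the one-step recursion obtained by comparing Bellman backups under the true transitions P and the estimated transitions \hat P:

    \big\| V_t - \hat{V}_t \big\|_\infty \;\le\; \big\| V_{t+1} - \hat{V}_{t+1} \big\|_\infty \;+\; \max_{s,a} \big\| P(\cdot \mid s,a) - \hat{P}(\cdot \mid s,a) \big\|_1 \, \big\| \hat{V}_{t+1} \big\|_\infty,

so each backward step contributes an error controlled by the L1 distance between the transition models, scaled by the magnitude of the downstream value function.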
Journal: CoRR
Volume: abs/1002.1480
Pages: -
Publication year: 2010